# Tutorial 2: Annotating the *in situ* metabonomics data

```python
import os
import numpy as np
from scipy.stats import gaussian_kde
from CONTINUED.annotation import *
import pandas as pd

# Work from the result directory so that relative output paths are written there.
os.chdir('/data/yuchen_data/desi_scripts/data/annotation_data/result/')

work_dir = '/data/yuchen_data/desi_scripts/data/annotation_data/combined'
output_prefix = '/data/yuchen_data/desi_scripts/data/annotation_data/result/colon_cancer_desi_'
input_lipid = '/data/yuchen_data/desi_scripts/data/annotation_data/20210930.Lipid.8_samples.uniq.txt'
input_small_mol = '/data/yuchen_data/desi_scripts/data/annotation_data/20220107.combined.small_molecule.neg.uniq.txt'
input_sample_list = '/data/yuchen_data/desi_scripts/data/annotation_data/sample.list.selected.txt'
mass_cutoff = 0.02
```

### Step 1: Parse the DESI data and the LC-MS data

```python
sample_mass, mass_sample, mass = Parsing_Mass_Table(input_sample_list, work_dir)
```

```python
lipid = Parsing_Lipid(input_lipid)
small_mol = Parsing_Small_Molecule(input_small_mol)
```

### Step 2: Generate a file named 'mass_dis_in_samples.txt'

```python
output_sample_mass = 'mass_dis_in_samples.txt'
Print_Mass_Diff_By_Samples(sample_mass, output_sample_mass)
```

### Step 3: Cluster all m/z values with KDE

```python
mass_index_group = Group_Mass(mass, lipid, small_mol, mass_cutoff)
mass_clustered = Clustering_Mass_by_KDE(mass_index_group, lipid, small_mol, mass_cutoff)
```

### Step 4: Generate the file 'colon_cancer_desi_.clustered_mass.table.with.anno.txt', which records the annotation information for all m/z values across all samples

```python
Print_Clustered_Mass_By_Sample(mass_clustered, mass_sample, lipid, small_mol, output_prefix)
```

```python
# Each row represents an LC-MS annotated metabolite and each column represents a sample.
# Each cell indicates whether an m/z value in that sample has been annotated as the
# corresponding metabolite: if it has, the cell value is the m/z for that sample;
# if not, the cell value is NaN.
df = pd.read_csv('colon_cancer_desi_.clustered_mass.table.with.anno.txt', index_col=0, sep='\t')
df.head()
```
| Index | ST06_20210716 | ST06_20211019 | ST08_20211019 | ST103_20210718 | ST109_20210330 | ST114_20210730 | ST118_20211222 | ST121_20210806 | ST124_20211223 | ST129_20201210 | ... | ST73_20210728_mass | ST73_20210729_mass | ST84_20211223_mass | ST87_20210331_mass | ST88_20210331_mass | ST91_20210406_mass | ST98_20210715_mass | ST98_20210804_mass | anno_lipid | anno_small_mol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 71.0133;C3 H4 O2;H;Acrylic acid |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 74.02421;C2 H5 N O2;H;Glycine |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 78.91830999999999;H Br;H;Hydrogen bromide |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79.95662999999999;None;H;None |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 66 columns
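
Step 3 relies on a kernel density estimate (KDE) to cluster m/z values. The internals of `Clustering_Mass_by_KDE` are not shown in this tutorial, so the block below is only a minimal sketch of the general idea and should not be read as the package's implementation. It estimates the density of a set of m/z values with `scipy.stats.gaussian_kde` and splits the values wherever the density has a valley. The helper name `cluster_mz_by_kde`, its `bandwidth` parameter, and the example m/z values are all assumptions introduced for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde


def cluster_mz_by_kde(mz, bandwidth):
    """Hypothetical helper: label m/z values by the KDE density peak they fall under."""
    mz = np.asarray(mz, dtype=float)
    # gaussian_kde treats a scalar bw_method as a factor that multiplies the sample
    # standard deviation, so divide to get a kernel width of `bandwidth` in m/z units.
    kde = gaussian_kde(mz, bw_method=bandwidth / mz.std(ddof=1))
    # Evaluate the density on a grid much finer than the kernel width.
    grid = np.arange(mz.min() - 3 * bandwidth, mz.max() + 3 * bandwidth, bandwidth / 10)
    density = kde(grid)
    # Interior grid points that are no higher than both neighbours are density valleys.
    inner = density[1:-1]
    is_valley = (inner <= density[:-2]) & (inner <= density[2:])
    cuts = grid[1:-1][is_valley]
    # Count the valleys to the left of each value, then renumber the groups 0, 1, 2, ...
    raw = np.searchsorted(cuts, mz)
    _, labels = np.unique(raw, return_inverse=True)
    return labels


# Made-up m/z values forming three tight groups (illustrative only).
example_mz = [255.228, 255.231, 255.233, 255.236,
              255.298, 255.301, 255.304,
              255.372, 255.375]
labels = cluster_mz_by_kde(example_mz, bandwidth=0.01)
for value, label in zip(example_mz, labels):
    print(f"{value:.3f} -> cluster {label}")
```

In this sketch the kernel width decides how close two groups of m/z values can be before they merge into a single density peak: groups separated by less than roughly two kernel widths tend to end up in the same cluster. Whether the package ties this width to `mass_cutoff` is not stated in the tutorial.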